Capstone Research

Tim Holthuijsen

This is the Climate Model Correction Research for my AUC capstone. In this Jupyter Notebook, I will perform multiple types of error analysis on climate model outputs in order to quantify and recognize patterns in climate model error. I will then use a combination of machine learning, deep learning, and statistical techniques to predict and reduce these patterns of climate model error, yielding an algorithm for climate model improvement. This method can be reproduced with any other climate model outputs, and has proven to significantly improve the accuracy of the Global Ensemble Forecast System model data: reducing its inaccuracy by more than 83%.

Imports

We can begin this analysis after loading the reforecast temperature dataset into the reforecast variable

Setting the lons, lats, and temp variables for the dataset

Now we can plot the temperature in the reforecast dataset along its respective longitude and latitude, using the Cartopy package with a Plate Carrée map projection.

Reanalysis

We can now repeat this process to get the Reanalysis data mapped out:

Varying over Time

We now vary the temperature map over time, showing the global temperature at the second timeslot in the dataset.

Converting this to actual dates leads us to:

We now display the second timeslot and check whether the temperature map is indeed different from the one a month earlier. The map does indeed show slight differences in the temperature distribution, confirming that we have successfully taken a step in time across the dataset.

Data Processing

As becomes apparent from the differences between these two figures, the reforecast and reanalysis datasets show significantly differing temperature patterns. To compare and quantify the differences in these patterns, we will subtract the reforecast dataset from the reanalysis dataset to obtain the model errors. To do this, we first have to get both datasets into the same dimensions, in both time and spatial resolution. We will convert these dimensions now:

From this initial comparison between the pictures, it already becomes apparent that the datasets seem rather different. However, at the moment, a large part of this difference can also be attributed to the fact that the datasets are not displaying the global temperature at exactly the same time. In order to get these maps to show the same point in time, we have to get the time variable sorted out:

As we can see above, the reforecast's time variable is expressed in number of hours since 1800-01-01, while the reanalysis uses number of hours since 1900-01-01. To get these into the same format, some converting needs to be done:

There is exactly a 100-year difference between the two datasets' time units. We can use this to our advantage in converting them:
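The conversion can be sketched as follows. This is a minimal illustration of the idea using plain integer hour counts rather than the notebook's actual NetCDF variables; the function names are hypothetical:

```python
from datetime import datetime, timedelta

# Hours between the two reference epochs (100 years, including 24 leap days)
EPOCH_OFFSET_H = int((datetime(1900, 1, 1) - datetime(1800, 1, 1)).total_seconds() // 3600)

def to_1900_epoch(hours_since_1800):
    """Re-express 'hours since 1800-01-01' as 'hours since 1900-01-01'."""
    return hours_since_1800 - EPOCH_OFFSET_H

def hours_to_date(hours_since_1900):
    """Turn 'hours since 1900-01-01' into a calendar datetime."""
    return datetime(1900, 1, 1) + timedelta(hours=float(hours_since_1900))
```

For example, the reforecast start time 1625304 (hours since 1800) converts to 748728 hours since 1900, which corresponds to 1 June 1985.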

And then the extent should line up more nicely:

As shown above, the datasets now overlap for the largest part, exactly as was to be expected. The overlap is not perfect because the reforecast dataset runs from 1-6-1985 until 16-9-2020, while the reanalysis dataset runs from 1-1-1980 until 1-8-2020. We will therefore only use the overlapping window for our analysis: from 1-6-1985 until 1-8-2020. Or, in NetCDF time units: from 1625304 until 1933608.

Right now, the reanalysis dataset starts at an earlier point in time, and the reforecast ends at a later one. To get them both in the same shape, we start cropping the respective time datasets:

Now, both time arrays span exactly the same window in time: from 1-6-1985 until 1-8-2020. However, the reforecast still has a much higher temporal resolution, as seen above. To get them into the same resolution, we reduce the reforecast's resolution to match that of the reanalysis:

We create a list x which is True for every time index in the reforecast that we want to keep (422 in total), and False for each one we want to discard.
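A minimal sketch of such a mask, using NumPy's isin on toy hour values (the real arrays would come from the two NetCDF time variables):

```python
import numpy as np

def keep_mask(high_res_times, low_res_times):
    """True for every entry of high_res_times that also occurs in low_res_times."""
    return np.isin(high_res_times, low_res_times)

# Toy example: 6-hourly timestamps filtered down to a coarser set
ref_times = np.arange(0, 48, 6)          # 0, 6, 12, ..., 42
rea_times = np.array([0, 24])
x = keep_mask(ref_times, rea_times)      # the boolean list described above
filtered = ref_times[x]
```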

Data organization

Because we have transformed and converted all datasets a lot, we do a quick reorganization to get all of the data we will be working with into the same naming format. Additionally, the reanalysis dataset has a 4 times higher spatial resolution than the reforecast data, so we reduce the reanalysis resolution to match the reforecast exactly.
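One simple way to match the grids is strided subsampling along the spatial axes; block averaging would be a smoothing alternative. This sketch assumes the fine grid is an exact integer multiple of the coarse one:

```python
import numpy as np

def coarsen(grid, factor):
    """Subsample a (time, lat, lon) array by keeping every `factor`-th
    point along both spatial axes (assumes aligned grid origins)."""
    return grid[:, ::factor, ::factor]

fine = np.zeros((2, 8, 12))     # toy stand-in for the high-resolution stack
coarse = coarsen(fine, 4)       # spatial axes shrink by the given factor
```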

Systematic Comparison

More than 27 million datapoints!

Since the temporal and spatial resolutions of the datasets were significantly different, a lot of data was lost while cropping the resolutions to match each other. However, we are still left with 181 x 360 = 65,160 datapoints per map, over 422 different moments in time, resulting in a total of 181 x 360 x 422 = 27,497,520 datapoints over which we can perform our analysis. This is a very large training set, and should be more than enough to train a machine learning model to predict the climate model inaccuracy.

Now that we have transformed the reanalysis and reforecast datasets to match each other exactly, comparing them and quantifying their discrepancies becomes straightforward.

Now that the reanalysis and reforecast are in the same resolution, we can calculate the model error by subtracting the reforecast from the reanalysis array. The value this yields is the difference between the reforecasted model predictions by the GEFS and the reanalysis temperature values by ERA5. This is a crude definition of model error, since the reanalysis itself may contain unaccounted model bias, which could undermine the validity of the comparison. Therefore, later in this notebook we will compare the error data from this initial estimation with a more sophisticated error estimation based on observational data. If the initial and observational errors are relatively close to one another, we will use the initial error estimation for our analysis, since it contains a vast number of datapoints. If the two error estimations diverge significantly, however, we will use an error estimation based on purely observational data instead. But first, we define and calculate our total error estimation:
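The error definition itself is a plain elementwise subtraction; a toy sketch with hypothetical 2x2 temperature grids:

```python
import numpy as np

def model_error(reanalysis, reforecast):
    """Per-gridpoint error: positive where the model predicted too low,
    negative where it predicted too high."""
    return np.asarray(reanalysis) - np.asarray(reforecast)

rea = np.array([[280.0, 285.0], [290.0, 295.0]])   # hypothetical 'truth'
ref = np.array([[279.0, 286.0], [290.0, 297.0]])   # hypothetical forecast
err = model_error(rea, ref)
```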

Climate model error

Then, we can display the climate model error on a global scale as shown below. The red places indicate that the model prediction was too low, and the blue places indicate the model prediction was too high. Very interesting patterns can be discerned from this map, which will be investigated further in the paper.

As discussed above, the two maps contain 422 overlapping moments in time. The climate model error for each of these 422 moments differs significantly, and plotting more maps will show widely varying results. To give an indication of this, we show 10 different error maps, equally distributed between 1985 and 2020:

Averaging Error

As can be seen from the maps above, the climate model error varies wildly across different timeslots: not one of the error maps looks the same. While this is good for training our model, it makes it a lot harder to discern model error trends with the naked eye, since you don't know which type of error map is the most common. Therefore, we average all of the model errors from 1985 until 2020, and create one, all-encompassing averaged error map:
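Averaging amounts to taking the mean over the time axis of the stacked error maps; a toy illustration:

```python
import numpy as np

# Hypothetical stack of per-timeslot error maps, shape (time, lat, lon)
errors = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])

# One all-encompassing averaged error map, shape (lat, lon)
mean_error_map = errors.mean(axis=0)
```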

Observational Dataset Validation

To check that our error data does not merely stem from bias in the reanalysis, an observational in-situ temperature dataset is used to validate the accuracy of our error estimation. The observational dataset we use for this is the E-OBS daily gridded meteorological data for Europe, from 1950 to present, derived from in-situ observations.

Now, we once again have to make sure that the datasets are shown at the same moment in time, in order to compare them:

We write some code that gives us all the datetimes used by the error dataset:

And then we create a function that converts the validation set times to this time:

Then, we list all of the in-situ time moments that overlap with the error dataset times:

This allows us to filter the observation times down to only the ones we use.
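The overlap-and-filter step can be sketched with NumPy's intersect1d, which also returns the indices needed to subset each array (toy hour values, not the real datasets):

```python
import numpy as np

err_times = np.array([100, 200, 300, 400])        # hypothetical error-set times
obs_times = np.array([50, 100, 150, 300, 500])    # hypothetical observation times

# Shared timestamps plus the index arrays needed to filter both datasets
common, err_idx, obs_idx = np.intersect1d(err_times, obs_times, return_indices=True)
obs_filtered = obs_times[obs_idx]                  # observations we actually use
```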

Spatial resolution

Now that the temporal resolution matches, we do the same with the spatial resolution

We figure out which indices we want to use from the observational longitude and latitude.

Filtering based on the chosen indices

We have now successfully transformed the validation dataset into a shape comparable with the other datasets!

Cropping model data

Since the validation dataset only shows Europe, whereas the reanalysis and reforecast datasets are on a global scale, we have to clip these datasets to match the extent of the in-situ observations

However, the longitude/latitude units of the model data differ from the ones in the validation set, and do not go into the negatives. Therefore, we crop the negative longitudes (starting at -24) out of the validation set:

Data organization

Plotting the validation and reforecast datasets as they are in their cropped states:

Comparing Validation and Model Data

Now that everything is in the same shape, we can compare the validation data with both the reforecast and reanalysis datasets, to confirm that the reanalysis is indeed more accurate than the reforecast. We write a function that calculates the Root Mean Square Error (RMSE) of both the reanalysis and reforecast datasets, and compare the results.
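A minimal version of such an RMSE function; the NaN handling is an assumption on my part, but is useful here because the E-OBS grid only covers land:

```python
import numpy as np

def rmse(predicted, truth):
    """Root Mean Square Error between two equally shaped arrays,
    skipping gridpoints where the truth is missing (NaN)."""
    diff = np.asarray(predicted, float) - np.asarray(truth, float)
    return float(np.sqrt(np.nanmean(diff ** 2)))

truth = np.array([1.0, 2.0, 3.0, np.nan])
pred = np.array([1.0, 4.0, 3.0, 9.0])
score = rmse(pred, truth)        # NaN gridpoint is excluded from the mean
```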

Result: The Reanalysis is way more accurate than the Reforecast!

The reanalysis dataset has an RMSE of only 0.387 on the observational dataset, whereas the reforecast has an RMSE of 3.23. This shows that the reanalysis data is indeed a lot more accurate than the reforecast, and that we can keep using it as an authoritative dataset whose accuracy our final model will strive to achieve. If we can get the accuracy of our predictions of the future close to the accuracy of our predictions of the past (the reanalysis), this would be an incredible feat and would significantly reduce the error in contemporary climate models.

How accurate are the model predictions if we take Reanalysis data as baseline?

Instead of taking the observational dataset as the ground-truth baseline, we now want to check what the RMSE of the reforecast would be if we take the reanalysis dataset as the validation baseline. Therefore, we define a more general RMSE function that we can keep using to test different models, and start by testing the initial reforecast RMSE.

The fact that the RMSE of the reforecast on the reanalysis is lower than the RMSE on the observations is an interesting result, which will be discussed further in the accompanying paper.

This is what we will try to improve and reduce by using machine learning on the reforecast dataset

Train/Test Split

In order to train and validate our models, we split the data into a training and a test set, allocating 70% of our data to the training set and 30% to the test set. We want to keep the individual climate maps together, so we randomly take 70% of our 422 individual moments in time and allocate those as the training set.
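A sketch of splitting by whole maps rather than by individual gridpoints, so no timeslot leaks between train and test (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)      # fixed seed for reproducibility
n_maps = 422

# Randomly pick 70% of the timeslot indices for training
train_idx = rng.choice(n_maps, size=int(0.7 * n_maps), replace=False)
test_idx = np.setdiff1d(np.arange(n_maps), train_idx)
# Each whole map then goes to exactly one of the two sets, e.g.:
# train_maps = error[train_idx]; test_maps = error[test_idx]
```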

Now that we have the data we are going to work with set up, we determine the baseline RMSE that our training set has. The goal is to reduce this RMSE

Adding more training data

With just the reforecast temperature data, it will be hard to train the reforecast to be much better. Therefore, we add some additional data to train with. I have created a global landuse classification map using supervised image classification in ArcGIS. We will now load this landuse NetCDF map, in addition to more climate data reforecasted by the climate model: cloud cover and precipitation.

We filter the precipitation dataset's timesteps down to only the moments we use.

We create one huge CSV file in which we will input all of our training data:
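Building such a table amounts to flattening each (time, lat, lon) grid into one column; a toy sketch with pandas, where the column names are illustrative rather than the actual MegaTable schema:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real feature grids, shape (time, lat, lon)
temp = np.arange(8.0).reshape(2, 2, 2)
error = temp * 0.1                               # hypothetical target grid
t_idx, lat_idx, lon_idx = np.meshgrid(
    np.arange(2), np.arange(2), np.arange(2), indexing="ij")

table = pd.DataFrame({
    "time": t_idx.ravel(),
    "lat": lat_idx.ravel(),
    "lon": lon_idx.ravel(),
    "temperature": temp.ravel(),
    "error": error.ravel(),                      # the target column
})
# table.to_csv("MegaTable.csv", index=False)     # one row per gridpoint
```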

Setting up a general error adjustment

Our initial approach will be to calculate the average model error, and then to remove this error from the test set and see whether its accuracy has improved:
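With error defined as reanalysis minus reforecast, "removing" the average error means adding the training-set mean error map onto the test-set reforecast; a toy sketch with a constant warm bias:

```python
import numpy as np

# Training data: the model runs a constant 1 degree too warm
rea_train = np.full((3, 2, 2), 280.0)
ref_train = rea_train + 1.0
mean_err = (rea_train - ref_train).mean(axis=0)   # average error map: all -1.0

# Correction: nudge the test-set reforecast toward the reanalysis
ref_test = np.full((2, 2), 281.0)
corrected = ref_test + mean_err                   # removes the warm bias
```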

This very simple form of error reduction by simply removing the average error of the training set from the test set already improves model performance, reducing RMSE from 2.9 to 2.5. While this is only a small improvement, it already shows the power and significance of our error adjustment. Even though an RMSE reduction of 0.4 seems small, this reduction was measured over more than 7 million datapoints, and is very significant.

Training

We now load our MegaTable.csv training set to start training models that can improve climate model predictions even further

Create a train/test split for this dataset

We first use a linear regression model:
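A minimal sklearn sketch of fitting such a regression and reading relative feature influence off the coefficient magnitudes (synthetic data; the real features come from MegaTable.csv, and coefficients are only comparable when features share a scale):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # e.g. temperature, landuse, lat
y = 0.5 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.01, size=200)

model = LinearRegression().fit(X, y)
importance = np.abs(model.coef_)              # larger |coef| = more influence
```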

From this, it becomes apparent that Landuse type seems very important in predicting climate model inaccuracy

Success

The Linear regression model is once again a significant improvement over the original Climate model accuracy, reducing RMSE from 2.9 to below 2.5. However, we want to see if we can do even better than this:

The statistical models above yield promising results and definitely increase climate model accuracy, but they may not be sufficiently complex to fully capture the patterns of climate model inaccuracy. To train more sophisticated models, we will start training neural networks to predict climate model inaccuracy, and see how well they perform. We initially start with the sklearn MLPRegressor:
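A sketch of the MLPRegressor setup on synthetic data; the architecture here is illustrative, not the notebook's exact configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 * X[:, 0] + X[:, 1]                   # a simple learnable target

mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
mlp.fit(X[:400], y[:400])

pred = mlp.predict(X[400:])
test_rmse = float(np.sqrt(np.mean((pred - y[400:]) ** 2)))
baseline = float(np.std(y[400:]))             # RMSE of always predicting the mean
```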

This Multi-layer Perceptron regressor reaches an RMSE score of 2.18, which is better than any model we have seen before, and significantly better than the reforecast's original RMSE of 2.90. A more advanced neural network could potentially do better, so I will attempt to train a custom neural network later on. For now, however, I believe we can still do better with a more complex machine learning approach, so we try a RandomForestRegressor:

The Random Forest regression seems promising, and has reduced the RMSE more than any other statistical model so far. However, to avoid training times of literally two hours, I have had to keep the model relatively simple (max_depth=2). To see if we can improve this by making it more complex, we do some hyperparameter tuning:
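The tuning step can be sketched with sklearn's GridSearchCV on a small synthetic problem; the parameter grid here is illustrative, not the notebook's actual search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))
y = X[:, 0] + (X[:, 1] > 0.5).astype(float)   # too intricate for depth-2 trees

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_depth": [2, 6], "n_estimators": [25, 50]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
best = search.best_params_                     # deeper trees should win here
```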

Wow! That is a really good RMSE score

Random Forest is incredibly skillful!

With the Random Forest adjustment, the climate model data only has an RMSE of 0.47, which is roughly an 84% improvement over its original RMSE of 2.90, and thus makes the climate model significantly better at predicting global temperatures.

The inaccuracy predicted by our model and the actual inaccuracy show a very strong correlation, as shown above

It appears that the temperature, location and time are rather important in predicting model inaccuracy, and that landuse type has less influence than expected.

Now that we have our Random Forest model with a remarkably high accuracy, we attempt to visualize its decision tree to get a better idea of how it works. It is impossible to visualize all 250 estimators of the RandomForestModel in one graph, so instead we visualize a simpler model with max_depth=3 to give an indication of how the correction model's decision tree functions.
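As a text-based alternative to a graphical tree plot, sklearn's export_text renders a shallow tree as readable if/else rules; the feature names here are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = 10.0 * X[:, 0]                     # target depends only on the first feature

# A shallow stand-in tree: a 250-estimator forest is not human-readable
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["temperature", "landuse"])
print(rules)
```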

We also visualize the Random Forest Decision Tree, but this one looks a bit less human-comprehensible

Best Model!

It seems that the RandomForestRegressor is the best model we can obtain. The output algorithm of this model is stored in RandomForestModel, and can be remade to tailor corrections specifically to any climate model by changing the ModelData variable at the very start of this file. The data inserted into ModelData needs to be a climate data NetCDF file with shape [422,360,180] across the dimensions (time, longitude, latitude).

For this specific Climate Model Dataset from the Global Ensemble Forecast System, our correction model has improved its accuracy from 2.90 RMSE to 0.47 RMSE, an accuracy improvement of more than 83%!

Custom Neural Network

To attempt an even more complex solution than the random forest, we build a custom neural network and train it to recognize and predict climate model error:

Initially, the model starts off with a rather high loss (RMSE) of around 2.53. However, during training this RMSE quickly goes down, already dropping to 2.45 after 200 epochs. We therefore attempt training the model over more epochs.

We try building the same model with 20,000 training epochs, which leads to an RMSE of 2.33, as seen below:

Tuning all of the parameters and architecture some more:

After a while, our RMSE loss begins returning NaN, so we switch to Keras' mean_absolute_error loss instead, as it stays consistent across the entire training period and can thus be trained for a larger number of epochs:

After 10,000 epochs of training, the neural network reaches an MAE of 2.25, which is better than the statistical approaches, but worse than the other deep learning methods. The initial sklearn Multi-Layer Perceptron (MLP) seems to be a more effective neural network, with an RMSE of 2.18.

And visualize the Neural Network:

And that's it. We have now trained multiple machine learning models that can predict and reduce climate model inaccuracy, with varying levels of success. The large Random Forest regressor in particular is very effective at predicting climate model error, reducing it from an RMSE of 2.9 to an RMSE of 0.47.